Abstract

The problem of classical data compression when the decoder has quantum side information 
at his disposal is considered. This is a quantum generalization of the classical Slepian-Wolf 
theorem. The optimal compression rate is found to be reduced from the Shannon entropy of 
the source by the Holevo information between the source and side information. 

Generalizing classical information theory to the quantum setting has had varying success
depending on the type of problem considered. Quantum problems hitherto solved (in the asymptotic
sense of Shannon theory) may be divided into three classes. The first comprises pure bipartite
entanglement manipulation, such as Schumacher compression and entanglement
concentration/dilution [?]. Their tractability is due to the formal similarities between a pair of
perfectly correlated random variables and the Schmidt decomposition of bipartite quantum states.

The second, and largest, is the class of "hybrid" classical-quantum problems, where only a
subset (usually of size one) of the terminals in the problem is quantum and the others are classical.
The simplest example is the Holevo-Schumacher-Westmoreland (HSW) theorem [?], which deals
with the capacity of a classical $\to$ quantum channel (abbreviated $\{c \to q\}$; see [?]). This carries
over to the multiterminal case involving many classical senders and one quantum receiver. Then
we have Winter's measurement compression theorem and remote state preparation [?].
These two may be thought of as simulating quantum $\to$ classical ($\{q \to c\}$) and $\{c \to q\}$ channels,
respectively. Another recent discovery has been quantum data compression with classical side
information available to both the encoder and decoder [?], generalizing the rate-entropy curve of
[?] to arbitrary pure state ensembles.

The third class is that of fully quantum communication problems, such as the
entanglement-assisted capacity theorem [13] and its reverse, that of simulating quantum channels
in the presence of entanglement [14]. These rely on methods of $\{c \to q\}$ channel coding combined
with super-dense coding [15] and $\{q \to c\}$ channel simulation combined with quantum
teleportation, respectively. A recent addition to this class has been the long awaited proof of the
channel capacity theorem [?], which also relies on classical-quantum methods.

The problem addressed here belongs to the second class and concerns classical data compression
when the decoder has quantum side information at his disposal. We shall refer to it as the
classical-quantum Slepian-Wolf (CQSW) problem, in analogy to its classical counterpart [?]. We begin
by introducing the notion of a bipartite classical-quantum system. The fully classical and fully
quantum analogues are familiar concepts. The former is embodied in a pair of correlated random
variables $XY$, associated with the product set $\mathcal{X} \times \mathcal{Y}$ and a probability distribution
$p(x,y) = \Pr\{X = x, Y = y\}$ defined on $\mathcal{X} \times \mathcal{Y}$. The latter is a bipartite quantum system $AB$,
associated with a product Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$ and a density operator $\rho^{AB}$, the "quantum
state" of the system $AB$, defined on $\mathcal{H}_A \otimes \mathcal{H}_B$. The state of a classical-quantum system $XQ$ is
now described by an ensemble $\mathcal{E} = \{\rho_x, p(x)\}$, with $p(x)$ defined on $\mathcal{X}$ and the $\rho_x$ being density
operators on the Hilbert space $\mathcal{H}_Q$ of $Q$. Thus, with probability $p(x)$ the classical index and
quantum state take on values $x$ and $\rho_x$, respectively. Such correlations may come about, for
example, when Bob holds the purification of a state Alice is measuring. Indeed, let Alice and Bob
initially share the quantum state (in Schmidt polar form)

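$$|\Phi\rangle_{AB} = \sum_i \sqrt{r_i}\, |i\rangle_A |i\rangle_B$$

with local density matrix $\rho = \sum_i r_i |i\rangle\langle i|$ on either side. Upon performing a POVM on $A$, defined
by the positive operators $\{\Lambda_x\}$ with $\sum_x \Lambda_x = 1$, Alice holds a random variable $X$ correlated with
Bob's quantum system $B$. Moreover, according to [?], the ensemble of $XB$ is given by $\{\rho_x, p(x)\}$,
where

$$p(x) = \mathrm{Tr}(\rho \Lambda_x), \qquad \rho_x = \frac{1}{p(x)} \left[ \sqrt{\rho}\, \Lambda_x \sqrt{\rho} \right]^*,$$

and $*$ denotes complex conjugation in the basis $\{|i\rangle\}$.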
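As a sanity check on the measurement-induced ensemble, the following Python sketch evaluates $p(x) = \mathrm{Tr}(\rho\Lambda_x)$ together with the standard expression $\rho_x = [\sqrt{\rho}\,\Lambda_x\sqrt{\rho}]^*/p(x)$ for a maximally entangled two-qubit state and a four-outcome "BB84" POVM. The restriction to real 2x2 matrices (so that complex conjugation is trivial), the helper names and the choice of POVM are assumptions made purely for illustration.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def projector(u, v):
    """|psi><psi| for the real amplitude vector (u, v)."""
    return [[u * u, u * v], [u * v, v * v]]

def induced_ensemble(schmidt_probs, povm):
    """Return [(p(x), rho_x)] on Bob's side for real POVM elements on Alice's side.

    Assumption: all matrices are real, so the complex conjugation is a no-op.
    """
    r0, r1 = schmidt_probs
    rho = [[r0, 0.0], [0.0, r1]]
    sqrt_rho = [[math.sqrt(r0), 0.0], [0.0, math.sqrt(r1)]]
    ensemble = []
    for element in povm:
        weighted = matmul(rho, element)
        p = weighted[0][0] + weighted[1][1]            # Tr(rho Lambda_x)
        unnormalised = matmul(matmul(sqrt_rho, element), sqrt_rho)
        rho_x = [[entry / p for entry in row] for row in unnormalised]
        ensemble.append((p, rho_x))
    return ensemble

s = 1 / math.sqrt(2)
# Hypothetical example: POVM elements (1/2)|s><s| for s in {0, 1, +, -}.
bb84_povm = [[[0.5 * entry for entry in row] for row in projector(u, v)]
             for u, v in [(1, 0), (0, 1), (s, s), (s, -s)]]
ensemble = induced_ensemble((0.5, 0.5), bb84_povm)
for p, rho_x in ensemble:
    print(p, rho_x)
```

Under these assumptions every outcome occurs with probability 1/4 and leaves Bob's qubit in the corresponding pure state.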

A useful representation of classical-quantum systems, which we refer to as the "enlarged Hilbert
space" (EHS) representation, is obtained by embedding the random variable $X$ in some quantum
system $A$. Then our ensemble $\{\rho_x, p(x)\}$ corresponds to the density operator

$$\rho^{AQ} = \sum_x p(x)\, |x\rangle\langle x|^A \otimes \rho_x^Q, \qquad (2)$$

where $\{|x\rangle : x \in \mathcal{X}\}$ is an orthonormal basis for the Hilbert space $\mathcal{H}_A$ of $A$. A classical-quantum
system may, therefore, be viewed as a special case of a quantum one. The EHS representation
is convenient for defining various information theoretical quantities for classical-quantum systems.
The von Neumann entropy of a quantum system $A$ with density operator $\rho^A$ is defined as
$H(A) = -\mathrm{Tr}\, \rho^A \log \rho^A$. For a bipartite quantum system $AB$ define the conditional von Neumann
entropy

$$H(B|A) = H(AB) - H(A),$$

and quantum mutual information

$$I(A;B) = H(A) + H(B) - H(AB) = H(B) - H(B|A),$$

in formal analogy with the classical definitions. Notice that for classical-quantum correlations
(2) the von Neumann entropy $H(A)$ is just the Shannon entropy $H(X) = -\sum_x p(x) \log p(x)$ of
$X$. The conditional entropy $H(Q|X)$ is defined as $H(Q|A)$ and equals $\sum_x p(x) H(\rho_x)$. Similarly,
the mutual information of $XQ$ is defined as $I(X;Q) = I(A;Q)$. Notice that this is precisely the
familiar Holevo information [?] of the ensemble $\mathcal{E}$:

$$\chi(\mathcal{E}) = H\Big( \sum_x p(x) \rho_x \Big) - \sum_x p(x) H(\rho_x).$$
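To make the Holevo information concrete, here is a minimal Python sketch that evaluates $\chi(\mathcal{E})$ for qubit ensembles. It assumes real symmetric 2x2 density matrices so that the eigenvalues have a closed form; the function names and the two example ensembles are hypothetical and chosen only for illustration.

```python
import math

def entropy_2x2(rho):
    """Von Neumann entropy (in bits) of a real symmetric 2x2 density matrix."""
    (a, b), (_, d) = rho
    # Closed-form eigenvalues of [[a, b], [b, d]].
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    eigenvalues = (mean + radius, mean - radius)
    # Convention 0 log 0 = 0; values below 1e-12 are treated as zero.
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 1e-12)

def pure_state(amplitudes):
    """Projector |psi><psi| for a real two-component amplitude vector."""
    u, v = amplitudes
    return [[u * u, u * v], [u * v, v * v]]

def holevo_information(ensemble):
    """chi = H(sum_x p(x) rho_x) - sum_x p(x) H(rho_x) for [(p, rho), ...]."""
    average = [[sum(p * rho[i][j] for p, rho in ensemble) for j in range(2)]
               for i in range(2)]
    return entropy_2x2(average) - sum(p * entropy_2x2(rho) for p, rho in ensemble)

s = 1 / math.sqrt(2)
# Two hypothetical ensembles: orthogonal states, and the pair |0>, |+>.
orthogonal = [(0.5, pure_state((1, 0))), (0.5, pure_state((0, 1)))]
overlapping = [(0.5, pure_state((1, 0))), (0.5, pure_state((s, s)))]
chi_orthogonal = holevo_information(orthogonal)
chi_overlapping = holevo_information(overlapping)
print(chi_orthogonal, chi_overlapping)
```

The orthogonal pair gives $\chi = H(X) = 1$ bit, whereas the non-orthogonal pair gives roughly 0.6 bits, so in the second case the quantum system carries strictly less than the full classical information.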



Returning to the formulation of the CQSW problem, suppose Alice and Bob share a large
number $n$ of copies of the classical-quantum system $XQ$. Alice possesses knowledge of the index
$x^n = x_1 x_2 \dots x_n$, but not the quantum system locally described by
$\rho_{x^n} = \rho_{x_1} \otimes \rho_{x_2} \otimes \dots \otimes \rho_{x_n}$;
Bob has the quantum system at his disposal but not the classical index. Note that this does
not necessarily imply that Alice can prepare a replica of Bob's state in a way that preserves its
entanglement with other systems. Alice wishes to convey the information contained in the index
to Bob almost perfectly, using a minimal amount of classical communication. If Bob did not have
the quantum information, she would need to send $\approx nH(X)$ classical bits. The question is: can
they reduce the communication cost by making use of Bob's quantum information? To consider a
trivial example, the members of the ensemble could be mutually orthogonal. Then Bob would be
able to perfectly distinguish among them by performing an appropriate measurement, requiring no
classical communication whatsoever. An intermediate case is when $XQ$ is given by the BB84 [?]
ensemble $\mathcal{E}_{BB84}$. Taking $\{|0\rangle, |1\rangle\}$ to be the standard qubit basis, let
$|\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$. $\mathcal{E}_{BB84}$
assigns a probability of $\frac{1}{4}$ to each of $|0\rangle$, $|1\rangle$, $|+\rangle$ and $|-\rangle$, so that 2 bits are required to
describe Alice's classical data. However, she needs to send only 1 bit indicating the basis
$\{|0\rangle, |1\rangle\}$ or $\{|+\rangle, |-\rangle\}$ in which Bob should perform his measurement. The measurement
unambiguously reveals the identity of the chosen state without disturbing it. This example is a
one-shot paradigm for the general case. A single copy of a general $XQ$ does not have this property
of being decomposable into subensembles with mutually orthogonal elements. However, the block
$X^n Q^n = X_1 Q_1 X_2 Q_2 \dots X_n Q_n$, consisting of a large number of copies of $XQ$, does satisfy this
condition approximately. Since the problem is formulated as an asymptotic and approximate one,
this will suffice for our purposes. We shall show that Alice may reduce her communication cost by
at most $\approx nI(X;Q)$, and describe a protocol that achieves this.

We proceed to formally define the coding procedure. An $(n, \epsilon)$ CQSW code consists of

• a mapping $f : \mathcal{X}^n \to [M]$, $[M] = \{1, 2, \dots, M\}$, $M = 2^{nR}$, by which Alice encodes her
classical message $X^n$ into the index $I = f(X^n)$;

• a set $\{\Lambda^{(1)}, \Lambda^{(2)}, \dots, \Lambda^{(M)}\}$, where each $\Lambda^{(i)} = \{\Lambda^{(i)}_j\}$ is a POVM acting on $\mathcal{H}^{\otimes n}$ and
taking values $j \in [N]$;

• a decoding map $g : [M] \times [N] \to \mathcal{X}^n$ that provides Bob with an estimate $\hat{X}^n = g(I, J)$ of $X^n$
based on $I$ and the outcome $J$ of the POVM $\Lambda^{(I)}$ applied to Bob's quantum system $Q^n$.
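The three ingredients just listed (encoder $f$, a family of POVMs, decoder $g$) can be illustrated on the one-shot BB84 example with $n = 1$. The sketch below is only an illustration under stated assumptions: states are represented by real amplitude vectors, the POVMs are the two projective basis measurements, and all identifiers are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
# BB84 states as real amplitude vectors; the keys are Alice's classical index x.
STATES = {"0": (1.0, 0.0), "1": (0.0, 1.0), "+": (S, S), "-": (S, -S)}
# Message i selects the projective measurement Lambda^(i) that Bob applies.
BASES = {1: ("0", "1"), 2: ("+", "-")}

def f(x):
    """Alice's encoder: transmit only the basis index (1 bit instead of 2)."""
    return 1 if x in BASES[1] else 2

def outcome_probabilities(i, x):
    """Born probabilities of the measurement Lambda^(i) on the pure state rho_x."""
    psi = STATES[x]
    probabilities = []
    for label in BASES[i]:
        phi = STATES[label]
        overlap = phi[0] * psi[0] + phi[1] * psi[1]
        probabilities.append(overlap ** 2)
    return probabilities

def g(i, j):
    """Bob's decoder: message i and outcome j (0 or 1) give the estimate of x."""
    return BASES[i][j]

def error_probability():
    """P_e = Pr{g(I, J) != X} for the uniform source over the four states."""
    p_err = 0.0
    for x in STATES:
        i = f(x)
        for j, prob in enumerate(outcome_probabilities(i, x)):
            if g(i, j) != x:
                p_err += 0.25 * prob
    return p_err

rate_bits = math.log2(len(BASES))   # R = (1/n) log M with n = 1, M = 2
p_e = error_probability()
print(rate_bits, p_e)
```

In this toy case the code has rate 1 bit and zero error, matching the discussion of the BB84 ensemble; for a general ensemble such perfect decoding is only approached asymptotically.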

The rate $R$ signifies the number of bits per copy needed to encode the index $I$. The error
probability is required to be bounded:

$$P_e = \Pr\{\hat{X}^n \neq X^n\} \leq \epsilon.$$

Denoting Bob's residual state after the extraction of the classical information by $\tilde{\rho}_{x^n}$, its
disturbance with respect to $\rho_{x^n}$ must also be small on average:

$$\Delta = \sum_{x^n} p(x^n)\, \| \tilde{\rho}_{x^n} - \rho_{x^n} \|_1 \leq \epsilon. \qquad (3)$$

A rate $R$ is said to be achievable if for any $\epsilon, \delta > 0$ and all sufficiently large $n$ there exists an
$(n, \epsilon)$ code of rate $R + \delta$. Our main result (which first appeared in [?]) is the following theorem.

Theorem 1 (CQSW Theorem) Given a classical-quantum system $XQ$, a rate $R$ is achievable iff

$$R \geq H(X) - I(X;Q) = H(X|Q).$$

The "if" part of the proof is called the direct coding theorem, and the "only if" part is called the
converse.

Figure 1: The achievable rate region for the classical Slepian-Wolf problem.

Let us first compare our result to the classical Slepian-Wolf problem. The latter is usually
formulated as a three terminal problem. We are given two correlated sources described by the
random variables $X$ and $Y$, known to Alice and Bob, respectively. They encode their sources
separately and send them to Charlie at rates $R_1$ and $R_2$, respectively, who decodes them jointly
with the aim of faithfully reconstructing $X$ and $Y$. One may now ask about the achievable rate
region $(R_1, R_2)$. The answer is given by

$$R_1 \geq H(X|Y),$$
$$R_2 \geq H(Y|X),$$
$$R_1 + R_2 \geq H(XY),$$

as shown in figure 1. It suffices to show the achievability of the points $(H(X), H(Y|X))$ and
$(H(X|Y), H(Y))$, since the rest of the region follows by time sharing (merging codes of length
$\nu n$ and $(1 - \nu)n$, $\nu \in [0,1]$, corresponding to the two points, respectively). The obvious
classical-quantum generalization of this result would be to replace $Y$ by a quantum system $Q$, and the joint
distribution of $XY$ by the ensemble state

$$\rho^{AQ} = \sum_{x,y} p(x,y)\, |x\rangle\langle x| \otimes \pi_y, \qquad (4)$$

where $\pi_y$ are density operators on $Q$, which for the sake of this discussion we assume to be pure.
Observe that the state written here has the same form as in (2), with

$$p(x) = \sum_y p(x,y), \qquad \rho_x = \frac{1}{p(x)} \sum_y p(x,y)\, \pi_y.$$

But here the description also contains the decomposition of $\rho_x$ into pure states, i.e., a chosen
ensemble.
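For comparison with the classical region quoted above, the following Python sketch computes the two corner points $(H(X), H(Y|X))$ and $(H(X|Y), H(Y))$ and a time-shared rate pair for a joint distribution $p(x,y)$. The example source (a uniform bit $X$ and a copy $Y$ flipped with probability 0.1) and the function names are assumptions introduced only for illustration.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def slepian_wolf_corners(joint):
    """Corner points (H(X), H(Y|X)) and (H(X|Y), H(Y)) for joint[(x, y)] = p(x, y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    h_xy = entropy(joint.values())
    h_x, h_y = entropy(px.values()), entropy(py.values())
    return (h_x, h_xy - h_x), (h_xy - h_y, h_y)

def time_share(corner_a, corner_b, nu):
    """Rate pair from using the first code on a fraction nu of the block."""
    return tuple(nu * a + (1 - nu) * b for a, b in zip(corner_a, corner_b))

# Hypothetical source: X uniform, Y equals X flipped with probability 0.1.
flip = 0.1
joint = {(0, 0): 0.5 * (1 - flip), (0, 1): 0.5 * flip,
         (1, 0): 0.5 * flip, (1, 1): 0.5 * (1 - flip)}
corner_a, corner_b = slepian_wolf_corners(joint)
midpoint = time_share(corner_a, corner_b, 0.5)
print(corner_a, corner_b, midpoint)
```

Every time-shared point lies on the line $R_1 + R_2 = H(XY)$, which is why the two corner points suffice to generate the whole region.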

The task of coding is, analogously to Schumacher's theorem, to enable Charlie to reconstruct
$|x^n\rangle\langle x^n| \otimes \pi_{y^n}$ with high average fidelity, in a situation of many independent realizations of
$\rho^{AQ}$. Indeed, Theorem 1 implies the achievability of the point $(H(X|Q), H(Q))$. Bob may
Schumacher compress his quantum system and send it to Charlie at a qubit rate of $R_2 = H(Q)$.
The latter uses it as quantum side information, and Alice needs to send classical information to
Charlie at a bit rate of $R_1 = H(X|Q)$. Furthermore, after having used the quantum system for this
purpose, according to (3) it will remain basically intact. (Note that our proof of the direct coding
theorem below actually shows that even the average disturbance of the $\pi_{y^n}$ is small; in fact, the
decoder is such that it causes little disturbance to purifications of the $\rho_{x^n}$.)

As for the other point $(H(X), H(Q|X))$, we do not know to what extent the classical result
carries over. There are, as in the above discussion, trivial examples where it is achievable. One
example is perfect correlation, when $p(x,y) \neq 0$ iff $x = y$: then, knowing $x$ one can perfectly
reconstruct the pure state of $Q$ because it has to be $\pi_x$. So, $R_1 = H(X)$, $R_2 = 0$ is achievable.
Another is when $X$ can be read off $Q$, i.e. when the states $\pi_y$ fall into mutually orthogonal classes
$\mathcal{Y}_x$ such that $p(x,y) \neq 0$ implies $y \in \mathcal{Y}_x$. Then Alice can Shannon compress her $x^n$, and Bob,
since he can read $x^n$ on his system, can Schumacher compress to a rate $H(Q|X)$ (compare [?] and
[?]).

Notice that there are two variants to the coding problem here: blind (where Bob has to operate
on the $\pi_y$), and visible (where he is told $y$). Note that the labeling of the different ensembles for the
$\rho_x$ by the same set $\mathcal{Y}$ is purely artificial; this is why there is more than one visible coding problem
associated to the same ensemble. In particular, we cannot expect the answers to the visible and
to the blind problem to be the same. Both, however, are open problems.

Proof of Theorem 1 (converse) We need to prove that, for any $\delta, \epsilon > 0$ and sufficiently large
$n$, if an $(n, \epsilon)$ code has rate $R$ then $R \geq H(X|Q) - \delta$. Without loss of generality,
$\epsilon \leq \delta / (2 \log|\mathcal{X}|)$ and $n > 2/\delta$. We shall make use of two inequalities. The first is the Holevo
bound [21], according to which the amount of information about $X^n$ extractable from the quantum
system $Q^n$ is bounded from above by $I(X^n; Q^n) = nI(X;Q)$. Recall that Bob makes an estimate
$\hat{X}^n = g(I, J)$ of $X^n$ based on $I = f(X^n)$ and the measurement outcome $J$. Our second ingredient
is Fano's inequality [?]:

$$H(X^n | IJ) \leq h_2(P_e) + P_e \log(|\mathcal{X}|^n - 1).$$

Here $h_2(p) = -p \log p - (1 - p) \log(1 - p)$ is the binary entropy. This inequality is interpreted as:
given $IJ$ one can specify $X^n$ by saying whether or not it is equal to $g(I, J)$ and, conditionally
upon a negative answer, specifying which of the remaining $|\mathcal{X}|^n - 1$ values it has taken. We have

$$\begin{aligned}
nR + nI(X;Q) &\geq H(I) + I(X^n; J) \\
&= H(X^n) + H(I | X^n J) + I(I; J) - H(X^n | IJ) \\
&\geq nH(X) - H(X^n | IJ) \\
&\geq n \Big( H(X) - \frac{1}{n} - \epsilon \log|\mathcal{X}| \Big).
\end{aligned}$$

The first inequality follows trivially from $I \in [2^{nR}]$ and Holevo's theorem. The second comes from
the non-negativity of mutual information and conditional entropy. The final one is a consequence
of Fano's inequality, using $h_2(P_e) \leq 1$ and $P_e \leq \epsilon$. With the above choice of $\epsilon$ and $n$, each of
the two correction terms is at most $\delta/2$. Thus $R \geq H(X) - I(X;Q) - \delta$, as claimed. ■
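The last step of the converse can be checked numerically. The following Python sketch evaluates the right-hand side of Fano's inequality, the relaxed bound $1 + n\epsilon\log|\mathcal{X}|$, and the resulting lower bound on $R$. The numbers plugged in (a BB84-like source with $|\mathcal{X}| = 4$, $H(X) = 2$, $I(X;Q) = 1$, and the particular $\delta$ and $n$) and the function names are hypothetical choices for illustration.

```python
import math

def binary_entropy(p):
    """h_2(p) in bits, with h_2(0) = h_2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def fano_bound(p_e, alphabet_size, n):
    """Right-hand side of Fano's inequality for sequences in X^n."""
    return binary_entropy(p_e) + p_e * math.log2(alphabet_size ** n - 1)

def relaxed_bound(epsilon, alphabet_size, n):
    """The simpler bound 1 + n * epsilon * log|X| used in the final step."""
    return 1 + n * epsilon * math.log2(alphabet_size)

def converse_rate_bound(h_x, holevo, epsilon, alphabet_size, n):
    """Lower bound on R from the chain: H(X) - I(X;Q) - 1/n - epsilon log|X|."""
    return h_x - holevo - 1 / n - epsilon * math.log2(alphabet_size)

# Hypothetical numbers, chosen to satisfy the two "without loss of generality" conditions.
delta = 0.1
alphabet_size = 4
epsilon = delta / (2 * math.log2(alphabet_size))   # epsilon <= delta / (2 log|X|)
n = 25                                              # n > 2 / delta
fano = fano_bound(epsilon, alphabet_size, n)
relaxed = relaxed_bound(epsilon, alphabet_size, n)
rate_bound = converse_rate_bound(2.0, 1.0, epsilon, alphabet_size, n)
print(fano, relaxed, rate_bound)
```

With these values the rate bound stays within $\delta$ of $H(X) - I(X;Q)$, as the proof requires.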

Remark An alternative way to demonstrate the converse uses a recent result on remote state
preparation [?], according to which Alice and Bob may establish classical-quantum correlations
$XQ$ with asymptotically perfect fidelity using shared entanglement and forward classical
communication at rates of $H(Q)$ ebits and $I(X;Q)$ bits, respectively. Let us assume that the converse
fails, i.e. that it is possible to achieve a CQSW rate $R < H(X|Q)$. Then with the help of shared
entanglement Alice would be able to convey $X$ at a classical rate strictly less than $H(X)$, by first
remotely preparing the quantum information and then performing the CQSW protocol. We know,
however, that entanglement can in no way increase the capacity of a classical channel, e.g. by [13].

Remark Note that the lower bound in Theorem 1 holds true even for CQSW codes which
disregard condition (3): we invite the reader to confirm that in the proof of the converse it was
never used.
Before launching into the proof of achievability we give a heuristic argument. Let us recall
typical sequences (see [?] for an extensive discussion) and subspaces [?] and their properties. The
theorem of typical sequences states that given a random variable $X$ defined on a set $\mathcal{X}$ and with
probability distribution $p(x)$, for any $\epsilon, \delta > 0$ and sufficiently large $n > n_0(|\mathcal{X}|, \epsilon, \delta)$ there exists a
typical set $\mathcal{T}_{X,\delta} \subseteq \mathcal{X}^n$ of sequences $x^n$ such that

$$2^{n[H(X) - \delta]} \leq |\mathcal{T}_{X,\delta}| \leq 2^{n[H(X) + \delta]},$$

and $\Pr\{X^n \in \mathcal{T}_{X,\delta}\} \geq 1 - \epsilon$. Typical sequences are those in which the fraction of a given letter $x$ is
approximated by its probability $p(x)$, and the law of large numbers guarantees that such sequences
will occur with high probability. Thus one need worry only about encoding typical sequences. The
quantum analogue of the typical set is the typical subspace $\mathcal{T}_{Q,\delta}$ of $\mathcal{H}^{\otimes n}$, defined for a quantum
system $Q$ with $d$-dimensional Hilbert space $\mathcal{H}$ and in a quantum state $\rho$. It satisfies

$$2^{n[H(Q) - \delta]} \leq \dim \mathcal{T}_{Q,\delta} \leq 2^{n[H(Q) + \delta]},$$

and $\mathrm{Tr}(\rho^{\otimes n} \Pi_{Q,\delta}) \geq 1 - \epsilon$, where $\Pi_{Q,\delta}$ is the projector onto $\mathcal{T}_{Q,\delta}$. Finally, for a
classical-quantum system $XQ$ and a particular sequence $x^n \in \mathcal{T}_{X,\delta}$ we define the conditionally typical
subspace $\mathcal{T}_{Q|X,\delta}(x^n)$ in the following way. The Hilbert space $\mathcal{H}^{\otimes n}$ can be decomposed into a tensor
product $\bigotimes_x \mathcal{H}_x$ with $\mathcal{H}_x$ collecting all the factors $k$ such that $x_k = x$. Then the conditionally typical
subspace is the tensor product of the typical subspaces of the $\mathcal{H}_x$ with respect to $\rho_x$. It follows
that

$$2^{n[H(Q|X) - K\delta]} \leq \dim \mathcal{T}_{Q|X,\delta}(x^n) \leq 2^{n[H(Q|X) + K\delta]},$$

for some constant $K$. At the same time $\mathrm{Tr}(\rho_{x^n} \Pi_{Q|X,\delta}(x^n)) \geq 1 - |\mathcal{X}|\epsilon$, where $\Pi_{Q|X,\delta}(x^n)$ is
the projector onto $\mathcal{T}_{Q|X,\delta}(x^n)$. The latter means that the trace decreasing measurement given by
$\Pi_{Q|X,\delta}(x^n)$ will succeed with high probability when applied to the state $\rho_{x^n}$. One would like to
construct a POVM out of such conditionally typical projectors for different $x^n$ belonging to some
set $\mathcal{C}$, in order to distinguish between them. Since the $\mathcal{T}_{Q|X,\delta}(x^n)$ are approximately contained in
$\mathcal{T}_{Q,\delta}$ [22], the task is, roughly speaking, to "pack" the $\mathcal{T}_{Q|X,\delta}(x^n)$, $x^n \in \mathcal{C}$, into the typical subspace
$\mathcal{T}_{Q,\delta}$. The former have dimension $\approx 2^{nH(Q|X)}$ and the latter has dimension $\approx 2^{nH(Q)}$, hence one
expects $|\mathcal{C}|$ to be at most $\approx 2^{n[H(Q) - H(Q|X)]} = 2^{nI(X;Q)}$. This is the basic content of the HSW,
or $\{c \to q\}$ channel coding, theorem [?], although the actual POVM construction is rather more
subtle. Accordingly, $\mathcal{C}$ is called a channel code. Here we take one step further and ask about the
minimal number of disjoint channel codes that "cover" the typical input set $\mathcal{T}_{X,\delta}$. The size of $\mathcal{T}_{X,\delta}$
is $\approx 2^{nH(X)}$, so the number of codes needed should be $\approx 2^{n[H(X) - I(X;Q)]}$. Now Alice need only
send information about which code her source sequence $x^n$ belongs to, and Bob can perform the
appropriate measurement to distinguish it from the other sequences in that code, as in the one-shot
BB84 example. The described construction is depicted in figure 2.
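The size and probability bounds for the classical typical set can be checked exactly for a small example. The Python sketch below does this for a Bernoulli source; it assumes the entropy-typical definition $|-\frac{1}{n}\log p(x^n) - H(X)| \leq \delta$ (which for a binary source is equivalent to a constraint on the fraction of ones), and the parameter values and function name are hypothetical.

```python
import math

def typical_set_statistics(p, n, delta):
    """Return (H(X), log2 |T|, Pr{X^n in T}) for a Bernoulli(p) source.

    Assumption: a sequence is typical when
    |-(1/n) log2 p(x^n) - H(X)| <= delta.
    """
    q = 1 - p
    h = -p * math.log2(p) - q * math.log2(q)
    size, probability = 0, 0.0
    for k in range(n + 1):                       # k = number of ones in x^n
        log_prob = k * math.log2(p) + (n - k) * math.log2(q)
        if abs(-log_prob / n - h) <= delta:
            count = math.comb(n, k)              # number of sequences with k ones
            size += count
            probability += 2.0 ** (math.log2(count) + log_prob)
    return h, math.log2(size), probability

n, delta = 1000, 0.05
h, log_size, probability = typical_set_statistics(p=0.2, n=n, delta=delta)
print(h, log_size / n, probability)
```

For these illustrative parameters the exponent $\frac{1}{n}\log|\mathcal{T}|$ falls inside $[H(X) - \delta, H(X) + \delta]$ and the typical set already carries most of the probability; the same counting, applied to subspace dimensions, underlies the packing estimate above.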

To prove Theorem 1 we shall need some background on channel codes. For a given
classical-quantum system $XQ$, a channel code $\mathcal{C}$ is a subset of $\mathcal{X}^n$, associated with a POVM
$\Lambda = \{\Lambda_{x^n} : x^n \in \mathcal{C}\}$ acting on $\mathcal{H}^{\otimes n}$. The rate of the channel code is defined as $r = \frac{1}{n} \log|\mathcal{C}|$. The error
probability of a given $x^n \in \mathcal{C}$ is $p_e(x^n) = 1 - \mathrm{Tr}(\rho_{x^n} \Lambda_{x^n})$. $\mathcal{C}$ is called an $(n, \epsilon)$ code if
$\max_{x^n \in \mathcal{C}} p_e(x^n) \leq \epsilon$. We shall need the following version of the $\{c \to q\}$ channel coding
theorem [22]:

Theorem 2 (Winter, Theorem 10) For all $\eta, \epsilon, \delta \in (0, 1)$, sufficiently large $n > n_1(|\mathcal{X}|, d, \eta, \epsilon, \delta)$
and every subset $\mathcal{A} \subseteq \mathcal{X}^n$ with $\Pr\{X^n \in \mathcal{A}\} \geq \eta$, there exists an $(n, \epsilon)$ channel code $\mathcal{C}$ of rate
$r \geq I(X;Q) - \delta$ satisfying $\mathcal{C} \subseteq \mathcal{A}$.

The $\mathcal{C} \subseteq \mathcal{A}$ condition is sufficiently strong to easily yield the achievability part of the CQSW
theorem, following a standard classical argument of Csiszár and Körner [23].
Figure 2: A simple counting argument for the optimal CQSW rate.



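Proof of Theorem 1 (coding) Fixing a sufficiently small $\epsilon > 0$ and any $\delta > 0$, we shall first
show that for sufficiently large $n$ there exists a family of disjoint channel codes
$\{\mathcal{C}_1, \mathcal{C}_2, \dots, \mathcal{C}_{M-1}\}$ such that

$$\Pr\Big\{ X^n \notin \bigcup_{i=1}^{M-1} \mathcal{C}_i \Big\} \leq 2\epsilon$$

and $\frac{1}{n} \log M \leq H(X|Q) + 2\delta$, thus upper bounding the number of channel codes needed to cover
most of the high probability sequences. Recall that for $n > n_0(|\mathcal{X}|, \epsilon, \delta)$ we have
$\Pr\{X^n \in \mathcal{T}_{X,\delta}\} \geq 1 - \epsilon$. By Theorem 2, applied with $\eta = \epsilon$, we also have that for
$n > n_1(|\mathcal{X}|, d, \epsilon, \epsilon, \delta)$ and every subset $\mathcal{A} \subseteq \mathcal{X}^n$ with $\Pr\{X^n \in \mathcal{A}\} \geq \epsilon$ there exists an $(n, \epsilon)$
code $\mathcal{C}$ of rate $r \geq I(X;Q) - \delta$ satisfying $\mathcal{C} \subseteq \mathcal{A}$. We choose $n > \max\{n_0, n_1\}$ so that both
conditions are satisfied. The idea is to keep constructing disjoint codes from $\mathcal{T}_{X,\delta}$ for as long as
Theorem 2 allows. Define $\mathcal{A}_1 = \mathcal{T}_{X,\delta}$, and let $\mathcal{C}_1 \subseteq \mathcal{A}_1$ be an $(n, \epsilon)$ code as specified by
Theorem 2. Recursively construct in a similar manner $\mathcal{C}_i \subseteq \mathcal{A}_i$, where
$\mathcal{A}_i = \mathcal{T}_{X,\delta} - \bigcup_{j=1}^{i-1} \mathcal{C}_j$, which will also satisfy the conditions of the theorem as long as
$\Pr\{X^n \in \mathcal{A}_i\} \geq \epsilon$. Suppose the construction stops at $i = M$, i.e. $\Pr\{X^n \in \mathcal{A}_M\} < \epsilon$. Then we
have

$$\Pr\Big\{ X^n \notin \bigcup_{i=1}^{M-1} \mathcal{C}_i \Big\} = \Pr\{X^n \notin \mathcal{T}_{X,\delta}\} + \Pr\{X^n \in \mathcal{A}_M\} \leq 2\epsilon. \qquad (5)$$

On the other hand

$$2^{n[H(X) + \delta]} \geq |\mathcal{T}_{X,\delta}| \geq \sum_{i=1}^{M-1} |\mathcal{C}_i| \geq (M - 1)\, 2^{n[I(X;Q) - \delta]},$$

which implies (absorbing the $O(1/n)$ correction from replacing $M - 1$ by $M$ into $\delta$)

$$R = \frac{1}{n} \log M \leq H(X) - I(X;Q) + 2\delta.$$

The mapping $f$ is now defined as

$$f(x^n) = \begin{cases} i & x^n \in \mathcal{C}_i \\ M & \text{otherwise.} \end{cases}$$

The latter case, which signifies an encoding error, happens with probability $\leq 2\epsilon$ by (5). Otherwise,
Bob performs the POVM corresponding to the code $\mathcal{C}_{f(x^n)}$, which fails to correctly identify $x^n$ with
probability $\leq \epsilon$. Therefore the total error probability is bounded by $P_e \leq 3\epsilon$. Finally, Winter's
"gentle measurement" lemma [7], which states that a POVM with a highly predictable outcome on
a given state cannot disturb it much, guarantees that the average disturbance $\Delta$ is bounded by
$\sqrt{8\epsilon} + \epsilon$. The direct coding theorem follows. ■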
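The recursive construction in the proof can be mimicked on a toy scale. The sketch below is not the construction of Theorem 2: it assumes the BB84 ensemble, uses all of $\mathcal{X}^n$ in place of the typical set, and replaces $(n, \epsilon)$ channel codes by zero-error codes made of mutually orthogonal product states, found greedily. All function names are hypothetical.

```python
from itertools import product

# Pairs of BB84 letters whose states are orthogonal.
ORTHOGONAL = {("0", "1"), ("1", "0"), ("+", "-"), ("-", "+")}

def distinguishable(a, b):
    """Product BB84 states are orthogonal iff some position holds orthogonal letters."""
    return any((s, t) in ORTHOGONAL for s, t in zip(a, b))

def extract_code(available):
    """Greedily pick mutually orthogonal sequences (a zero-error channel code)."""
    code = []
    for candidate in available:
        if all(distinguishable(candidate, member) for member in code):
            code.append(candidate)
    return code

def greedy_cover(sequences):
    """Peel off disjoint codes C_1, C_2, ... until every sequence is covered."""
    remaining, codes = list(sequences), []
    while remaining:
        code = extract_code(remaining)
        codes.append(code)
        remaining = [s for s in remaining if s not in code]
    return codes

n = 2
sequences = ["".join(letters) for letters in product("01+-", repeat=n)]
codes = greedy_cover(sequences)
# The encoder f maps each sequence to the index of the code containing it.
encoder = {x: i for i, code in enumerate(codes, start=1) for x in code}
print(len(codes), [len(code) for code in codes])
```

For $n = 2$ this greedy pass ends with 4 disjoint codes of 4 sequences each, in line with the count $2^{nH(X|Q)}$ for $H(X|Q) = 1$; the dictionary `encoder` plays the role of $f$.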

Remark The "gentle measurement" lemma invoked at the end of the proof actually applies
equally if the measurement acts on one half of a purification of the state, a fact we needed in
the discussion of the Slepian-Wolf theorem after the statement of Theorem 1.

Finally, we would like to comment on a connection to Winter's measurement compression
theorem [?]. Suppose Bob needs to perform a "BB84" measurement given by the operation elements
$\{\frac{1}{2}|0\rangle\langle 0|, \frac{1}{2}|1\rangle\langle 1|, \frac{1}{2}|+\rangle\langle +|, \frac{1}{2}|-\rangle\langle -|\}$ on a quantum system described by the uniform density
matrix. He would then need 2 classical bits of communication to convey the outcome to Alice.
Equivalently, he can use 1 bit of shared randomness between him and Alice to decide which of the
two measurements $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$ or $\{|+\rangle\langle +|, |-\rangle\langle -|\}$ he should perform, and send her only
1 bit describing the outcome. He has thus perfectly simulated the measurement, replacing 1 bit of
communication with the weaker resource of 1 bit of shared randomness.

For a general source-POVM pair $(\rho, \Lambda = \{\Lambda_x\})$, define the classical-quantum system $XQ$ by the
ensemble $\{\rho_x, p(x)\}$ with $p(x) = \mathrm{Tr}(\rho \Lambda_x)$ and $\rho_x = \frac{1}{p(x)} [\sqrt{\rho}\, \Lambda_x \sqrt{\rho}]^*$; in other words $XQ$
embodies the correlations between the measurement outcome $X$ (to be sent to Alice) and the
reference system $Q$ that purifies the system to be measured. In an asymptotic and approximate
setting, the measurement $\Lambda^{\otimes n}$ is considered well simulated on $\rho^{\otimes n}$ if the classical-quantum
correlations established between Alice and Bob's reference system closely resemble $n$ copies of $XQ$.
It was shown in [?] that the optimal classical communication and shared randomness rates become
$I(X;Q)$ and $H(X|Q)$, respectively. It is not surprising that the minimal amount of classical
communication required to establish a remote classical-quantum correlation is given by the
corresponding Holevo information. Achievability of this bound may be described by a diagram
similar to the one depicted in figure 2, with the difference that "PACKING" should be replaced by
"COVERING". The idea is to divide the set of typical outcome sequences $\mathcal{T}_{X,\delta}$ into codes $\mathcal{C}_i$,
$i \in [M]$, such that $\{\rho_{x^n} : x^n \in \mathcal{C}_i\}$ mimics the set of residual states of the reference system after
performing some measurement $\Lambda^{(i)}$. Thus $|\mathcal{C}_i|$ must be sufficiently large to allow

$$\sum_{x^n \in \mathcal{C}_i} p(x^n)\, \rho_{x^n} \approx \mathrm{const.} \times \rho^{\otimes n}, \qquad \forall i \in [M].$$

Since $\rho^{\otimes n}$ and $\rho_{x^n}$ are "almost" uniformly supported on $\mathcal{T}_{Q,\delta}$ and $\mathcal{T}_{Q|X,\delta}(x^n)$, respectively [?],
dimension counting arguments again suggest $|\mathcal{C}_i| \approx 2^{nI(X;Q)}$; moreover $M \approx 2^{nH(X|Q)}$ as before.
$\Lambda^{\otimes n}$ is then simulated by randomly choosing one of the $\Lambda^{(i)}$, as in the BB84 example.

Coding with side information is a relatively unexplored and potentially rich area of quantum 
information theory. We have presented here an important member of this class of problems, 
providing yet another example of classical Shannon theory generalizing to the quantum domain. 

Our main result can be understood as the translation of one of the two extremal rate points 
of the classical Slepian-Wolf region to a classical-quantum scenario. The main question left open 
is whether one can also translate the other rate point; it might actually be that the rate region in 
the quantum case does not look like figure 1.